
Learned Image Compression


Joint Autoregressive and Hierarchical Priors for Learned Image Compression

Neural Information Processing Systems

Recent models for learned image compression are based on autoencoders that learn approximately invertible mappings from pixels to a quantized latent representation. The transforms are combined with an entropy model, which is a prior on the latent representation that can be used with standard arithmetic coding algorithms to generate a compressed bitstream. Recently, hierarchical entropy models were introduced as a way to exploit more structure in the latents than previous fully factorized priors, improving compression performance while maintaining end-to-end optimization. Inspired by the success of autoregressive priors in probabilistic generative models, we examine autoregressive, hierarchical, and combined priors as alternatives, weighing their costs and benefits in the context of image compression. While it is well known that autoregressive models can incur a significant computational penalty, we find that in terms of compression performance, autoregressive and hierarchical priors are complementary and can be combined to exploit the probabilistic structure in the latents better than all previous learned models. The combined model yields state-of-the-art rate-distortion performance and generates smaller files than existing methods: 15.8% rate reductions over the baseline hierarchical model and 59.8%, 35%, and 8.4% savings over JPEG, JPEG2000, and BPG, respectively. To the best of our knowledge, our model is the first learning-based method to outperform the top standard image codec (BPG) on both the PSNR and MS-SSIM distortion metrics.
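The interplay the abstract describes can be illustrated with a toy one-dimensional sketch (not the paper's actual architecture): a hierarchical prior alone codes each latent with a zero mean and a hyperprior scale sent as side information, while the joint model additionally lets the causal context (here, just the previous symbol) refine the predicted mean, which lowers the coded bit count on correlated latents. The AR(1)-style data and the 0.9 prediction coefficient are illustrative assumptions.

```python
import numpy as np
from math import erf, sqrt, log2

def gaussian_bits(y, mu, sigma):
    # Bits to code integer symbol y under N(mu, sigma^2), integrating the
    # density over the unit-width quantization bin [y - 0.5, y + 0.5].
    cdf = lambda x: 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))
    return -log2(max(cdf(y + 0.5) - cdf(y - 0.5), 1e-12))

# Toy latent sequence with spatial correlation, decoded in raster order.
rng = np.random.default_rng(0)
y = np.zeros(128)
for i in range(1, 128):
    y[i] = round(0.9 * y[i - 1] + rng.normal())

sigma_hyper = max(y.std(), 0.1)  # scale sent as side information

# Hierarchical prior alone: zero mean, hyperprior scale.
bits_hier = sum(gaussian_bits(v, 0.0, sigma_hyper) for v in y)

# Joint prior: the causal context (previous symbol) also refines the mean,
# mimicking a masked-convolution context model in one dimension.
bits_joint = sum(gaussian_bits(y[i], 0.9 * (y[i - 1] if i else 0.0), sigma_hyper)
                 for i in range(128))
```

On correlated latents the joint model spends fewer bits than the hierarchical prior alone, which is the complementarity the abstract reports, scaled down to a toy.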


Causal Context Adjustment Loss for Learned Image Compression

Neural Information Processing Systems

In recent years, learned image compression (LIC) technologies have notably surpassed conventional methods in terms of rate-distortion (RD) performance. Most current learned techniques are VAE-based with an autoregressive entropy model, which clearly improves RD performance by utilizing the decoded causal context. However, existing methods depend heavily on a fixed, hand-crafted causal context. How to guide the auto-encoder to generate a causal context that better benefits the autoregressive entropy model is worth exploring. In this paper, we make the first attempt at investigating a way to explicitly adjust the causal context with our proposed Causal Context Adjustment loss (CCA-loss).
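The spirit of such a loss can be sketched as follows (a hedged illustration, not the paper's exact formulation): the usual rate + lambda * distortion objective is augmented with an auxiliary term that rewards latents whose already-decoded causal context predicts the not-yet-decoded latents well. The weights `lam` and `beta` and the linear predictor are illustrative assumptions.

```python
import numpy as np

def cca_style_loss(rate_bpp, mse, ctx, target, predictor, lam=0.01, beta=0.1):
    # Rate-distortion objective plus an auxiliary term penalizing how badly
    # the causal context 'ctx' predicts the later-decoded latents 'target'.
    aux = float(np.mean((target - predictor(ctx)) ** 2))
    return rate_bpp + lam * mse + beta * aux

rng = np.random.default_rng(1)
ctx = rng.normal(size=64)                        # latents decoded first
target = 0.8 * ctx + 0.2 * rng.normal(size=64)   # latents decoded later

informative = cca_style_loss(0.5, 10.0, ctx, target, lambda c: 0.8 * c)
uninformative = cca_style_loss(0.5, 10.0, ctx, target, lambda c: 0.0 * c)
```

A context that carries predictive information scores a lower loss, which is the behavior such an auxiliary term is meant to encourage during end-to-end training.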


Learned Image Compression and Restoration for Digital Pathology

Lee, SeonYeong, Seong, EonSeung, Lee, DongEon, Lee, SiYeoul, Cho, Yubin, Park, Chunsu, Kim, Seonho, Seo, MinKyung, Ko, YoungSin, Kim, MinWoo

arXiv.org Artificial Intelligence

Learned Image Compression and Restoration for Digital Pathology
Preprint, compiled April 2, 2025
SeonYeong Lee 1, EonSeung Seong 1, DongEon Lee 1, SiYeoul Lee 1, Yubin Cho 1, Chunsu Park 1, Seonho Kim 1, MinKyung Seo 1, YoungSin Ko 3, and MinWoo Kim 1,2,*
1 Department of Information Convergence Engineering, Pusan National University, Yangsan, Korea
2 School of Biomedical Convergence Engineering, Pusan National University, Yangsan, Korea
3 Seegene Medical Foundation, Seoul, Korea
The first two authors contributed equally to this work.

Abstract: Digital pathology images play a crucial role in medical diagnostics, but their ultra-high resolution and large file sizes pose significant challenges for storage, transmission, and real-time visualization. To address these issues, we propose CLERIC, a novel deep learning-based image compression framework designed specifically for whole slide images (WSIs). CLERIC integrates a learnable lifting scheme and advanced convolutional techniques to enhance compression efficiency while preserving critical pathological details. Our framework employs a lifting-scheme transform in the analysis stage to decompose images into low- and high-frequency components, enabling more structured latent representations. These components are processed through parallel encoders incorporating Deformable Residual Blocks (DRB) and Recurrent Residual Blocks (R2B) to improve feature extraction and spatial adaptability. The synthesis stage applies an inverse lifting transform for effective image reconstruction, ensuring high-fidelity restoration of fine-grained tissue structures. We evaluate CLERIC on a digital pathology image dataset and compare its performance against state-of-the-art learned image compression (LIC) models. Experimental results demonstrate that CLERIC achieves superior rate-distortion (RD) performance, significantly reducing storage requirements while maintaining high diagnostic image quality. Our study highlights the potential of deep learning-based compression in digital pathology, facilitating efficient data management and long-term storage while ensuring seamless integration into clinical workflows and AI-assisted diagnostic systems.

Keywords: Learned Image Compression, Deep Learning, Wavelet Transform, Digital Pathology, Whole Slide Image

1 Introduction
Digital pathology images serve as fundamental data for various medical applications, playing a crucial role in cancer diagnosis, disease analysis, and treatment planning. These images are typically stored as Whole Slide Images (WSIs), which are characterized by ultra-high resolution (typically 0.25 µm/px). A single uncompressed WSI can often exceed several gigabytes in size (e.g., 20-30 GB per image), posing significant challenges in terms of storage, transmission, and computational efficiency.
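The lifting scheme underlying the analysis/synthesis stages is the classic predict/update decomposition, which is invertible by construction. A minimal one-dimensional sketch with fixed scalar coefficients follows; in a learnable lifting scheme such as CLERIC's, the fixed scalars `p` and `u` would be replaced by learned networks.

```python
import numpy as np

def lift_forward(x, p=0.5, u=0.25):
    # Split into even/odd samples, predict each odd sample from its even
    # neighbor, then update the evens with the residual.
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    high = odd - p * even   # prediction residual (high-frequency band)
    low = even + u * high   # updated coarse signal (low-frequency band)
    return low, high

def lift_inverse(low, high, p=0.5, u=0.25):
    # The lifting structure inverts exactly: undo the update, then the
    # prediction, then interleave the even/odd samples.
    even = low - u * high
    odd = high + p * even
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x
```

Perfect reconstruction holds for any choice of `p` and `u`, which is why lifting is attractive for high-fidelity restoration of tissue detail: the transform itself introduces no loss.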


Lightweight Embedded FPGA Deployment of Learned Image Compression with Knowledge Distillation and Hybrid Quantization

Mazouz, Alaa, Chaudhuri, Sumanta, Cagnanzzo, Marco, Mitrea, Mihai, Tartaglione, Enzo, Fiandrotti, Attilio

arXiv.org Artificial Intelligence

Learnable Image Compression (LIC) has shown the potential to outperform standardized video codecs in RD efficiency, prompting research into hardware-friendly implementations. Most existing LIC hardware implementations prioritize latency over RD efficiency and rely on an extensive exploration of the hardware design space. We present a novel design paradigm where the burden of tuning the design for a specific hardware platform is shifted towards model dimensioning, without compromising RD efficiency. First, we design a framework for distilling a leaner student LIC model from a reference teacher: by tuning a single model hyperparameter, we can meet the constraints of different hardware platforms without a complex hardware design exploration. Second, we propose a hardware-friendly implementation of the Generalized Divisive Normalization (GDN) activation that preserves RD efficiency even after parameter quantization. Third, we design a pipelined FPGA configuration which takes full advantage of available FPGA resources by leveraging parallel processing and optimizing resource allocation. Our experiments with a state-of-the-art LIC model show that we outperform all existing FPGA implementations while performing very close to the original model.
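For reference, the GDN activation the abstract mentions normalizes each channel by a learned combination of the squared activations of all channels. A float sketch of the standard formulation follows; a hardware-friendly variant of the kind described would approximate the square root and division in fixed-point arithmetic, which this sketch does not attempt. The example `beta`/`gamma` values are illustrative assumptions.

```python
import numpy as np

def gdn(x, beta, gamma):
    # Generalized Divisive Normalization across channels:
    #   y_c = x_c / sqrt(beta_c + sum_k gamma[c, k] * x_k**2)
    return x / np.sqrt(beta + gamma @ (x ** 2))

x = np.array([1.0, -2.0, 0.5, 3.0])   # one activation per channel
beta = np.ones(4)                      # learned offsets (toy values)
gamma = 0.1 * np.eye(4)                # learned cross-channel weights (toy)
y = gdn(x, beta, gamma)
```

With positive `beta` and non-negative `gamma` the denominator is at least 1 here, so the activation is sign-preserving and contractive, the property that makes it effective for Gaussianizing latent statistics in LIC models.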



Reviews: Joint Autoregressive and Hierarchical Priors for Learned Image Compression

Neural Information Processing Systems

Summary: This paper extends the autoencoder trained for compression of Balle et al. (2018) with a small autoregressive model. The autoencoder of Balle uses Gaussian scale mixtures (GSMs) for entropy encoding of coefficients, and encodes its latent variables as side information in the bit stream. Here, conditional Gaussian mixtures are used which additionally use neighboring coefficients as context. The authors find that this significantly improves compression performance.

Good:
– Good performance (notably, state-of-the-art MS-SSIM results without optimizing directly on this metric)
– Extensive supplementary materials, including rate-distortion curves for individual images
– Well written

Bad:
– Incremental, with no real conceptual contributions
– Missing related work: there is a long history of conditional Gaussian mixture models for autoregressive modeling of images – including for entropy rate estimation – that is arguably more relevant than other generative models mentioned in the paper: Domke et al. (2008), Hosseini et al. (2010), Theis et al. (2012), Uria et al. (2013), Theis et al. (2015)
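The "neighboring coefficients as context" mechanism the review refers to is typically realized with a masked convolution, whose kernel sees only latents that precede the center in raster-scan order, so the decoder can reproduce the same context. A minimal sketch of such a mask (PixelCNN-style type-'A'):

```python
import numpy as np

def causal_mask(k):
    # k x k mask for a masked convolution: only positions strictly before
    # the center in raster-scan order are visible, which is what keeps an
    # autoregressive context model decodable.
    m = np.ones((k, k))
    c = k // 2
    m[c, c:] = 0      # zero out the center and everything to its right
    m[c + 1:, :] = 0  # zero out all rows below the center
    return m
```

For k = 3 this leaves exactly the four already-decoded neighbors (the row above plus the left neighbor) visible to the entropy model.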


GABIC: Graph-based Attention Block for Image Compression

Spadaro, Gabriele, Presta, Alberto, Tartaglione, Enzo, Giraldo, Jhony H., Grangetto, Marco, Fiandrotti, Attilio

arXiv.org Artificial Intelligence

While standardized codecs like JPEG and HEVC-intra represent the industry standard in image compression, neural Learned Image Compression (LIC) codecs represent a promising alternative. In particular, integrating attention mechanisms from Vision Transformers into LIC models has shown improved compression efficiency. However, this extra efficiency often comes at the cost of aggregating redundant features. This work proposes a Graph-based Attention Block for Image Compression (GABIC), a method to reduce feature redundancy based on a k-Nearest Neighbors enhanced attention mechanism. Our experiments show that GABIC outperforms comparable methods, particularly at high bit rates, enhancing compression performance.
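A k-NN-restricted attention mechanism of the general kind described can be sketched as ordinary scaled dot-product attention in which each query keeps only its `knn` most similar keys and masks out the rest before the softmax. This is a toy illustration of the idea, not GABIC's exact layer.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def knn_attention(q, k, v, knn=3):
    # Each query attends only to its 'knn' highest-scoring keys; all other
    # attention logits are set to -inf so they get zero softmax weight.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    drop = np.argsort(scores, axis=-1)[:, :-knn]  # indices of the lowest scores
    np.put_along_axis(scores, drop, -np.inf, axis=-1)
    return softmax(scores) @ v
```

Setting `knn` to the number of keys recovers full attention exactly, so the restriction can be seen as a sparsification knob that suppresses low-affinity (potentially redundant) feature aggregation.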


Region of Interest Loss for Anonymizing Learned Image Compression

Liebender, Christoph, Bezerra, Ranulfo, Ohno, Kazunori, Tadokoro, Satoshi

arXiv.org Artificial Intelligence

The use of AI in public spaces continually raises concerns about privacy and the protection of sensitive data. An example is the deployment of detection and recognition methods on humans, where images are provided by surveillance cameras. This results in the acquisition of great amounts of sensitive data, since images captured by such cameras are transmitted unaltered to a server on the network. However, many applications do not explicitly require the identity of a given person in a scene; an anonymized representation containing information about the person's position, while preserving their context in the scene, suffices. We show how a customized loss function on regions of interest (ROI) can achieve sufficient anonymization such that human faces become unrecognizable while persons remain detectable, by training an end-to-end optimized autoencoder for learned image compression that utilizes the flexibility of the learned analysis and reconstruction transforms to mutate parts of the compression result. This approach enables compression and anonymization in one step on the capture device, instead of transmitting sensitive, non-anonymized data over the network. Additionally, we evaluate how this anonymization impacts the average precision of pre-trained foundation models on detecting faces (MTCNN) and humans (YOLOv8) in comparison to non-ANN-based methods, while considering compression rate and latency.
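One plausible reading of such an ROI-based training signal (a hypothetical sketch, not the paper's exact formulation) is a mask-weighted distortion in which reconstruction error inside the ROI, e.g. a face box, gets zero weight, so the codec is free to mutate identity detail there while the rest of the scene is penalized as usual.

```python
import numpy as np

def anonymizing_roi_mse(x, x_hat, roi):
    # Mask-weighted MSE: weight 0 inside the ROI (roi == 1), weight 1
    # elsewhere, so distortion inside the ROI never penalizes training.
    w = 1.0 - roi
    return float((w * (x - x_hat) ** 2).sum() / w.sum())

x = np.zeros((4, 4))
roi = np.zeros((4, 4)); roi[1:3, 1:3] = 1.0      # a 2x2 "face" region
x_hat = np.zeros((4, 4)); x_hat[1:3, 1:3] = 9.0  # heavy mutation inside ROI
```

Arbitrarily large mutation inside the ROI leaves this loss at zero, while any error outside the ROI is penalized, which is the asymmetry an anonymizing objective needs.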


Compressible and Searchable: AI-native Multi-Modal Retrieval System with Learned Image Compression

Luo, Jixiang

arXiv.org Artificial Intelligence

The burgeoning volume of digital content across diverse modalities necessitates efficient storage and retrieval methods. Conventional approaches struggle to cope with the escalating complexity and scale of multimedia data. In this paper, the proposed framework addresses this challenge by fusing AI-native multi-modal search capabilities with neural image compression. First, we analyze the intricate relationship between compressibility and searchability, recognizing the pivotal role each plays in the efficiency of storage and retrieval systems. A simple adapter is then used to bridge the features of Learned Image Compression (LIC) and Contrastive Language-Image Pretraining (CLIP), retaining semantic fidelity and enabling retrieval of multi-modal data. Experimental evaluations on the Kodak dataset demonstrate the efficacy of our approach, showcasing significant enhancements in compression efficiency and search accuracy compared to existing methodologies. Our work marks a significant advancement towards scalable and efficient multi-modal search systems in the era of big data.
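Such an adapter could be as simple as a small MLP that projects a pooled LIC latent into a CLIP-sized embedding space, L2-normalized so compressed codes can be compared against CLIP text/image embeddings by cosine similarity. This is a hypothetical sketch; the dimensions (192-d latent, 512-d CLIP space) and the two-layer shape are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def lic_to_clip_adapter(z, W1, b1, W2, b2):
    # Two-layer MLP: pooled LIC latent -> hidden ReLU layer -> CLIP-sized
    # embedding, unit-normalized for cosine-similarity search.
    h = np.maximum(z @ W1 + b1, 0.0)  # ReLU
    e = h @ W2 + b2
    return e / np.linalg.norm(e, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
z = rng.normal(size=(2, 192))                           # pooled LIC latents
W1, b1 = 0.05 * rng.normal(size=(192, 256)), np.zeros(256)
W2, b2 = 0.05 * rng.normal(size=(256, 512)), np.zeros(512)
emb = lic_to_clip_adapter(z, W1, b1, W2, b2)
```

In training, the adapter weights would be fit (e.g. contrastively) against frozen CLIP embeddings, leaving both the codec and CLIP untouched.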


Learned Image Compression with Text Quality Enhancement

Lai, Chih-Yu, Tran, Dung, Koishida, Kazuhito

arXiv.org Artificial Intelligence

Learned image compression has gained widespread popularity for its efficiency in achieving ultra-low bit-rates. Yet, images containing substantial textual content, particularly screen-content images (SCI), often suffer from text distortion at such compression levels. To address this, we propose to minimize a novel text logit loss designed to quantify the disparity in text between the original and reconstructed images, thereby improving the perceptual quality of the reconstructed text. Through rigorous experimentation across diverse datasets and employing state-of-the-art algorithms, our findings reveal significant enhancements in the quality of reconstructed text upon integration of the proposed loss function with appropriate weighting. Notably, we achieve a Bjontegaard delta (BD) rate of -32.64% for Character Error Rate (CER) and -28.03% for Word Error Rate (WER) on average by applying the text logit loss to two screenshot datasets. Additionally, we present quantitative metrics tailored for evaluating text quality in image compression tasks. Our findings underscore the efficacy and potential applicability of our proposed text logit loss function across various text-aware image compression contexts.
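The core of a text logit loss can be sketched as a distance between the per-character logits a frozen text recognizer produces for the original versus the reconstructed image. This is a hedged illustration: the recognizer is abstracted away (in practice an OCR network would be run on both images), and the mean-absolute-difference form is an assumption, not necessarily the paper's exact formulation.

```python
import numpy as np

def text_logit_loss(logits_ref, logits_rec):
    # Mean absolute disparity between the recognizer's character logits on
    # the original image (logits_ref) and on the reconstruction
    # (logits_rec); shape is (num_character_positions, vocab_size).
    return float(np.mean(np.abs(logits_ref - logits_rec)))
```

Because the recognizer is frozen, the gradient of this term flows only through the reconstructed image, pushing the codec to preserve whatever evidence the recognizer uses to read the text.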